(57) Abstract: The invention provides a method of maintaining geographic data comprising
the steps of retrieving from a data memory a first group of data comprising one or more data 
sets representing geographic data, each data set comprising one or more data items; compiling 
a second group of data comprising one or more data sets representing geographic data, each 
data set comprising one or more data items; and generating one or more spatial representations 
based on the first and/or second groups of data. The first group of data is preferably obtained 
from a national geographic database and the second group of data is preferably obtained from 
a further data store. The invention also provides a related system and computer program. 
METHOD AND SYSTEM FOR MAINTAINING GEOGRAPHIC DATA
FIELD OF INVENTION 

The invention relates to a method and system for maintaining geographic data. 
BACKGROUND TO INVENTION 

It is not uncommon for a person to change residential address and it is not 
uncommon for the same person to change address annually or even more frequently. 
Dwellings are continually altered, for example, old warehouses are turned into 
apartments, existing houses and buildings are demolished and new buildings are 
erected. Local body regulations and rules permit local bodies to create new 
subdivisions, rename existing subdivisions and to make other formal changes to 
addresses. 

In New Zealand, for example, there are approximately 1,290,000 unique addresses. 
1,200,000 of these are stored in one single database. It is estimated that of the 
addresses stored in this database, only 70% are correct. For the reasons stated 
above, it is a difficult task to compile and maintain an accurate and current address 
database. 

The lack of a definitive address database causes difficulties for organisations such as 
postal delivery services in the processing and delivery of mail. Similar difficulties are
also experienced by emergency services, for example the fire service, ambulance 
service, and the police. 

SUMMARY OF INVENTION 

In broad terms in one form the invention comprises a method of maintaining 
geographic data comprising the steps of retrieving from a data memory a first group 
of data comprising one or more data sets representing geographic data, each data set
comprising one or more data items; compiling a second group of data comprising one
or more data sets representing geographic data, each data set comprising one or
more data items; and generating one or more spatial representations based on the
first and/or second groups of data. 

WO 01/22281 



PCT/NZ00/00184 



In another form in broad terms the invention comprises a geographic data
maintaining system comprising a retrieval device arranged to retrieve from a data
memory a first group of data comprising one or more data sets representing
geographic data, each data set comprising one or more data items; a second group of
data comprising one or more data sets representing geographic data, each data set
comprising one or more data items; and a representation generator arranged to
generate one or more spatial representations based on the first and/or second
groups of data.

In another form in broad terms the invention comprises a geographic data
maintaining computer program comprising a retrieval device arranged to retrieve
from a data memory a first group of data comprising one or more data sets
representing geographic data, each data set comprising one or more data items; a
second group of data comprising one or more data sets representing geographic data,
each data set comprising one or more data items; and a representation generator
arranged to generate one or more spatial representations based on the first and
second groups of data.
BRIEF DESCRIPTION OF THE FIGURES

Preferred forms of the method and system for maintaining geographic data will now
be described with reference to the accompanying figures in which:

Figure 1 shows a block diagram of a system in which one form of the invention may
be implemented;

Figure 2 shows the preferred system architecture of hardware on which the present
invention may be implemented;

Figure 3 is an example geographic database;

Figure 4 illustrates a preferred method of maintaining data in the geographic
database;

Figure 5 shows a typical representation generated by the system; and

Figure 6 illustrates a method of data checking based on matches and partial
matches.

DETAILED DESCRIPTION OF PREFERRED FORMS 

Figure 1 illustrates a block diagram of the preferred system 10 in which one form of
the present invention 12 may be implemented. The system includes one or more
clients 20, for example 20A, 20B, 20C, 20D, 20E and 20F, each of which may comprise
a personal computer or workstation as described below. Each client 20 is interfaced to
the invention 12 as shown in Figure 1.

Each client 20 could be connected directly to the invention 12, could be connected
through a local area network or LAN, could be connected through the Internet, or
could be connected through a suitable wireless application protocol or WAP. Clients
20A and 20B, for example, are connected to a network 22, such as a local area
network or LAN. The network 22 could be connected to a suitable network server 24
and communicate with the invention 12 as shown. Client 20C is shown connected
directly to the invention 12. Clients 20D, 20E and 20F are shown connected to the
invention 12 through the Internet 26. Client 20D is shown connected to the Internet
26 with a dial-up connection and clients 20E and 20F are shown connected to a
network 28 which is in turn connected to a suitable network server 30.

The preferred system 10 comprises a geographic database 36 further described below
and preferably further comprises one or more other geographic databases. These
databases could comprise data compiled from two or more different sources. The
data sources could include a national geographic database 40, a local knowledge
base for example a 'postie walk book' 50, one or more regional geographic databases
60 and/or a post code database 70.

One preferred form of national geographic database 40 is the digital cadastral 
database (DCDB) maintained by Land Information New Zealand (LINZ). The DCDB is 
a computer register containing data on land parcels throughout New Zealand. It 
represents the geographic location, shape, area, land appellation and street address
for each land parcel and the legal definition of roads, road centrelines, railways and
hydrographic features. The DCDB also contains the definition of statistical
meshblocks and derived administration boundaries, such as local authorities and
electoral districts.

The DCDB has three major components, a spatial component which includes all 
coordinate and graphical information, an attribute component which contains 
descriptive information, and a topology component which contains information on 
topology or connectivity of the graphical data. 

The DCDB contains data obtained by digitising existing large scale cadastral record
maps and from Electoral Record Maps (ERM) which provide street address and
unique meshblock identifiers. Electoral Record Maps record a house number where
allocated and a street name as listed in the Authoritative Streets and Places (ASP)
database. The ASP database contains listings of street and place names for New
Zealand with respect to their location within a local authority or electoral district.

The DCDB contains street address information in a format conforming with
standards adopted by individual local authorities. Street names are stored in a
format compatible with the Authoritative Streets and Places (ASP) database compiled
for electoral purposes. Building and property names are not recorded in the DCDB
as there are no standards which have been adopted nationally by local authorities.

In addition to national geographic database 40, the preferred system could further 
comprise one or more regional geographic databases 60. It is the practice of local 
authorities to compile their own databases from the national geographic database 
and to store additional data in these databases. For example, a local authority may 
approve a new sub-division and store details of the sub-division in a regional 
geographic database. The regional geographic databases may contain data not 
contained in the national geographic database. Likewise, the national geographic 
database may contain data not contained in the regional geographic databases. 

The system 10 may also include a post code database 70. A typical post code
database contains data representing individual street names and locations,
together with the post code corresponding to that street name or location. The
database 70 may additionally contain details of the post office box number or private 
bag number, together with the post code of that post office box or private bag 
number. Additionally, the database 70 may include the post code corresponding to a 
particular suburb. 

A post code database 70 would be useful, for example, where the geographic
co-ordinates of an address are not known. Using the post code database 70, the post
code of the address can be determined and the geographic co-ordinates of that
post code can be determined, giving the approximate geographic co-ordinates of that
address. It will be appreciated that the post code database 70 may be replaced by
an equivalent database, for example a database which stores details of zip codes
where appropriate.
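
By way of illustration only, the fallback described above might be sketched in Python as follows. The two lookup tables, their made-up contents and the function name approximate_coordinates are assumptions introduced for this sketch; the specification does not prescribe any particular data layout.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sample of post code database 70: (street name, suburb) -> post code.
STREET_POSTCODES: Dict[Tuple[str, str], str] = {
    ("EXAMPLE STREET", "EXAMPLEVILLE"): "0001",
}

# Hypothetical sample: post code -> representative (x, y) co-ordinates.
POSTCODE_COORDINATES: Dict[str, Tuple[float, float]] = {
    "0001": (10.0, 20.0),
}


def approximate_coordinates(street_name: str, suburb: str) -> Optional[Tuple[float, float]]:
    """Return approximate co-ordinates for an address via its post code.

    Returns None when either the post code or its co-ordinates are unknown.
    """
    key = (street_name.strip().upper(), suburb.strip().upper())
    postcode = STREET_POSTCODES.get(key)
    if postcode is None:
        return None
    return POSTCODE_COORDINATES.get(postcode)
```

The result is only as precise as the area covered by the post code, so it would be suitable as a placeholder until exact co-ordinates are captured.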

The system may also include a local knowledge base 50. It is not uncommon for
certain groups of people to have detailed records of a particular geographic region.
One example is postal delivery personnel. Each employee typically delivers mail to
recipients on a postal route. Each postal route is recorded manually in a "postie
walk book". A walk book typically contains a list of the addresses on a postal route,
the names of residents on that route and additional information about each address.
For example, addresses are graded on the basis of delivery difficulty, using a scale of
1-4, with 1 indicating an address that is easy to walk or find and 4 indicating an
address that is difficult to walk or find. The postie walk book may also include
warnings of hazards, such as aggressive animals or residents. The walk books 50
could be stored either manually or electronically in a database.
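
Where a walk book is stored electronically, one entry might be represented as in the following Python sketch. The class name WalkBookEntry and its field names are hypothetical; only the content of an entry (address, residents, a difficulty grade of 1 to 4 and hazard warnings) is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WalkBookEntry:
    """One address on a postal route, as it might be keyed in from a walk book."""

    address: str
    residents: List[str] = field(default_factory=list)
    difficulty: int = 1                      # 1 = easy to walk or find, 4 = difficult
    hazards: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject grades outside the 1-4 scale described in the text.
        if not 1 <= self.difficulty <= 4:
            raise ValueError("difficulty must be between 1 and 4")
```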

Other groups of people have similar detailed knowledge of particular areas, for
example taxi drivers, charity collection agencies, door-to-door sales people and
advertising circular delivery people. The detailed knowledge available to these
groups of people would greatly increase the accuracy and integrity of the national
database if the knowledge could be captured effectively.

It will be appreciated that the individual databases 36, 40, 50, 60 and 70
could be installed on a single standalone computer or
could be stored on one or more servers accessible over a network or over the
Internet. Any of the databases could also be made available to the system 10
through peer-to-peer file sharing in which the invention 12 provides access to
network addresses for different sets of data.

One preferred form of the invention 12 comprises a personal computer or
workstation operating under the control of appropriate operating and application
software, having a data memory 80 connected to a server 90. The invention is
arranged to retrieve data from the databases 36, 40, 50, 60 and 70, to compare data
retrieved from these different sources, to display data on a client workstation 20
and/or to update data in one or more of the databases.

Figure 2 shows the preferred system architecture of a client 20 or invention 12. The
computer system 100 typically comprises a central processor 102, a main memory
104 for example RAM, and an input/output controller 106. The computer system
100 also comprises peripherals such as a keyboard 108, a pointing device 110 for
example a mouse, track ball or touch pad, a display or screen device 112, a mass
storage memory 114 for example a hard disk, floppy disk or optical disc, and an
output device 116 for example a printer. The system 100 could also include a
network interface card or controller 118 and/or a modem 120. The computer system
100 could also include WAP communication protocol apparatus. The individual
components of the system 100 could communicate through a system bus 122.

Figure 3 shows a representation of a preferred form of geographic database 36. The
database forming the geographic database 36 could be implemented using a number
of different products, for example, Oracle, Sybase, Informix, DB2, Microsoft SQL
Server, or Microsoft Access. The database shown in Figure 3 is a relational database
having a number of records, each record having a number of fields. Each record
comprises a data set and the data in each field comprises a separate data item.
Each data set represents a geographic location or street address stored in the
geographic database.

As shown in Figure 3, the preferred geographic database 36 contains a number of
different data items in each data set, for example a street number 200, a street name
202, a street type 204, a suburb 206 and a city 208. It is envisaged that where
appropriate the geographic database could also include a zip code, post code, state
and/or country. Each data set is preferably uniquely identified by a record identifier
210.

The geographic database may also include geographic co-ordinates. The geographic
co-ordinates shown in Figure 3 include x co-ordinates 212 and y co-ordinates 214
representing the geographic position of each street address as a latitude or longitude,
or in a suitable local map co-ordinate system.
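
A single data set of the kind shown in Figure 3 might be modelled as in the following Python sketch. The class and attribute names are assumptions chosen for readability; the reference numerals in the comments tie each attribute back to the data items described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AddressRecord:
    """One data set (record) of the geographic database 36."""

    record_id: int                # record identifier 210
    street_number: str            # street number 200
    street_name: str              # street name 202
    street_type: str              # street type 204
    suburb: str                   # suburb 206
    city: str                     # city 208
    x: Optional[float] = None     # x co-ordinate 212, if known
    y: Optional[float] = None     # y co-ordinate 214, if known

    def data_items(self) -> List[str]:
        """Return the textual data items in field order."""
        return [self.street_number, self.street_name, self.street_type,
                self.suburb, self.city]
```

The co-ordinates are optional in this sketch because the description contemplates addresses whose position is not yet known.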

The geographic database is preferably initially created from at least part of the
national geographic database 40. As will be described below the geographic database
is enriched with data from other sources.

The term 'street address' as used in the specification includes the geographic
address of rural areas, public facilities, for example schools and hospitals, and area
units, for example suburbs and cities. The address of a large area may, for example,
be stored as the centroid of that large area. It is also envisaged that the geographic
database may include data representing postal boxes and rural delivery points.
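
The specification does not state how the centroid of a large area is calculated. One possible approach, sketched below in Python, is the standard area centroid of a simple polygon (the shoelace formula). The function name polygon_centroid and the representation of the area as a list of (x, y) boundary points are assumptions made for this sketch.

```python
from typing import List, Tuple


def polygon_centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Area centroid of a simple (non self-intersecting) polygon."""
    twice_area = 0.0   # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon has no area centroid")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3 * twice_area), cy / (3 * twice_area)
```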

Figure 4 illustrates one method of capturing and maintaining data in accordance
with the invention. As indicated at 300, one or more data sets representing
geographic locations are retrieved from, for example, the national geographic
database 40 itself or an initial copy stored in the geographic database 36. A spatial
representation of the geographic data is generated through the output device 112.
The spatial representation is preferably a topographical map. The spatial
representation could further include data originally sourced from geographic
database 36, national geographic database 40, regional geographic databases 60,
post code database 70 and/or walk books 50.

The spatial representations or maps are supplied to personnel with a detailed 
knowledge of the geographic area, for example, postal workers. As shown at 302, 
each postal worker draws or marks out his or her territory on the map. Each 
territory represents the delivery area for which the postal worker is responsible. 

As indicated at 304, the territory boundaries are stored in the geographic database
36. Preferably, the territory is represented by a series of boundary points connected
by lines. The co-ordinates of each boundary point are determined, and these
boundary points are stored in the geographic database. The set of boundary points
could be indexed, for example, by a postal worker identifier.

As indicated at 306, spatial representations or maps of each territory are then
generated and printed through output device 112 as indicated at 308. The territory
maps are generated, for example, by retrieving the set of boundary points for a
particular postal worker and generating a spatial representation of geographic data
sets positioned within the territory boundary.
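
Selecting the data sets positioned within a territory boundary could be done in many ways. The Python sketch below uses a conventional ray-casting point-in-polygon test; this is a substituted, well-known technique and is not stated in the specification. The function names, the dictionary keyed by postal worker identifier and the sample field names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, boundary: List[Point]) -> bool:
    """Ray-casting test: count boundary edges crossed by a ray running in +x."""
    inside = False
    n = len(boundary)
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def addresses_in_territory(addresses: List[dict],
                           territories: Dict[str, List[Point]],
                           worker_id: str) -> List[dict]:
    """Retrieve a worker's boundary points and keep the addresses inside them."""
    boundary = territories[worker_id]
    return [a for a in addresses if point_in_polygon(a["x"], a["y"], boundary)]
```

Points lying exactly on a boundary line are not treated specially in this sketch; a production system would need an explicit rule for them.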

Territory maps are then supplied to each postal worker who would then ground truth
the geographic data against their local knowledge and manual records. It is
envisaged that postal workers will mark any changes to addresses on their territory
maps and these changes could be stored in the geographic database 36. In a similar
manner, it is envisaged that regular updates from databases 40, 50, 60 and 70 would
be transferred or otherwise made available to the geographic database 36, and so
maps generated from the geographic database will represent the new data.

Figure 5 illustrates a typical territory map generated from the geographic database. 
The territory of the postal worker is indicated at 400. Postal addresses within the 
territory 400 are indicated for example at 402. 

Preferably the postal worker marks the actual route travelled within the territory, for
example by indicating starting point 404, finish point 406, and the route travelled
indicated by a series of directional lines 408. This information could be captured
and stored in the geographic database 36, for example by storing a set of points
along a postal route, together with the geographic co-ordinates of those points, and
directional vectors connecting each point. It will be appreciated that the storage of
route data is valuable when reorganising a route within a territory to ensure
optimum efficiency, or when altering the size of the territory and marking a new
route through the revised territory.
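
As a minimal sketch of the route storage described above, the following Python code derives the directional vectors from an ordered list of route points and sums their lengths, which is one simple measure that might be compared when reorganising a route. The function names and the use of straight-line distance in map units are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_segments(points: List[Point]) -> List[Tuple[Point, Point]]:
    """Pair each route point with the vector leading to the next point."""
    return [((x0, y0), (x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def route_length(points: List[Point]) -> float:
    """Total straight-line length of the route, in map units."""
    return sum(math.hypot(dx, dy) for _, (dx, dy) in route_segments(points))
```

For example, a route through (0, 0), (3, 4) and (3, 10) has the vectors (3, 4) and (0, 6) and a total length of 11 map units.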

A preferred form of the invention is arranged to compare geographic data from the
geographic database with geographic data from further sources, for example the
postie walk book(s) 50. One method of matching the respective data sets is described
in our PCT patent application PCT/NZ00/00148 to Compudigm International
Limited filed on 3 August 2000 and entitled "Method and system for matching data
sets". The preferred method performs an exact or partial match comparison of the data
in the geographic database 36 with the data in, for example, the postie walk book 50.
The method further comprises the steps of compiling a list of data sets which are
contained in the geographic database but not the postie walk book and compiling a
list of data sets which are contained in the postie walk book but not the geographic
database.

As indicated at 500 in Figure 6, a first group of data comprising one or more data
sets is retrieved from the geographic database. A match rule is retrieved from a rule
base as indicated at 502. The match rules are described in more detail in
PCT/NZ00/00148. These match rules permit address records in the geographic
database 36 to be compared with geographic records in other databases, for example,
the postie walk book 50.

The match rules generally specify one or more data items in a data set from the
geographic database and one or more data items in a data set from another database
to be compared. Preferably the specified data items from, for example, the
geographic database data set are concatenated into a single string, and the single
string is then searched for individual data items from a data set from another
database. The rule returns a match or partial match if a significant portion of data
items from the geographic database record matches the data items in the other data
record. The system could return a ranking indicating the extent of the match which
could also serve as a threshold for the match.

The order in which the data items appear in the concatenated string is generally
unimportant, meaning that the system is able to match data sets where data items
are either missing or specified incorrectly. For example, the suburb data field could
be specified in the city data field, or the data in the suburb field may have been
transposed with the data in the city field. Matching concatenated data items in this
way would overcome these difficulties in the user data.
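
The concatenate-and-search comparison described above might be sketched in Python as follows. The actual match rules are defined in PCT/NZ00/00148 and are not reproduced here; the scoring formula (the fraction of the other record's items found in the concatenated string), the default threshold of 0.75 and the function names are all assumptions made for illustration.

```python
from typing import Iterable


def match_score(geo_items: Iterable[str], other_items: Iterable[str]) -> float:
    """Fraction of the other record's non-empty items found in the geographic record."""
    # Concatenate the geographic record's data items into a single string.
    haystack = " ".join(item.strip().upper() for item in geo_items if item)
    needles = [item.strip().upper() for item in other_items if item and item.strip()]
    if not needles:
        return 0.0
    found = sum(1 for needle in needles if needle in haystack)
    return found / len(needles)


def is_match(geo_items: Iterable[str], other_items: Iterable[str],
             threshold: float = 0.75) -> bool:
    """Treat the score as a ranking and compare it with a threshold."""
    return match_score(geo_items, other_items) >= threshold
```

Because each item is searched for anywhere in the concatenated string, a record whose suburb and city have been transposed still scores 1.0, and a record with a missing item is scored only on the items it does contain. A plain substring search is deliberately naive: a short item such as "1" would be found inside "12", so a practical rule would need to compare whole tokens.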

A second group of data comprising one or more data sets representing geographic
data is then retrieved from another database, for example the postie walk book 50,
as indicated at 504. As indicated at 506, the match rule retrieved from the rule base
is applied to compare the address record in the first group of data with the address
record in the second group of data. As shown at 508, if the match rule is satisfied,
the geographic record could be added to a candidate list as shown at 510.

As shown at 512, if there is another geographic record to compare, the next
geographic record is retrieved as indicated at 504. If there is another rule in the rule
base to apply as indicated at 514, the next match rule is retrieved from the rule base
at 502.

As shown at 516, if there is another address record in the geographic database to
check, the address record is retrieved from the geographic database as indicated at
500.
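
The control flow of Figure 6 described above can be read as three nested loops. The Python sketch below follows that reading, with the reference numerals shown in the comments. The function names, the representation of records as dictionaries of item lists and the two sample rules are assumptions; they stand in for the rule base, whose contents are defined elsewhere.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Rule = Callable[[List[str], List[str]], bool]


def exact_rule(a: List[str], b: List[str]) -> bool:
    """Hypothetical rule: same items in the same order, ignoring case."""
    return [s.upper() for s in a] == [s.upper() for s in b]


def any_order_rule(a: List[str], b: List[str]) -> bool:
    """Hypothetical rule: same items in any order, ignoring case."""
    return sorted(s.upper() for s in a) == sorted(s.upper() for s in b)


def build_candidate_list(geo_records: Dict[Hashable, List[str]],
                         other_records: Dict[Hashable, List[str]],
                         rules: List[Rule]) -> List[Tuple[Hashable, Hashable]]:
    """Apply every rule to every pair of records and collect the matches."""
    candidates: List[Tuple[Hashable, Hashable]] = []
    seen = set()
    for geo_id, geo_items in geo_records.items():                 # 500, 516
        for rule in rules:                                        # 502, 514
            for other_id, other_items in other_records.items():   # 504, 512
                if rule(geo_items, other_items):                  # 506, 508
                    pair = (geo_id, other_id)
                    if pair not in seen:     # record a pair once, even if
                        seen.add(pair)       # several rules are satisfied
                        candidates.append(pair)                   # 510
    return candidates
```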

As shown at 518, a list is compiled of data sets from the first group of data which do
not match any data set from the second group of data. In this way, data which is
stored in the geographic database 36 but is not stored in a postie walk book 50 can
be allocated to a postal worker to verify any changes or additions to be made to the
postie walk book(s) 50.

As shown at 520, a list is compiled of data sets from the second group of data which
do not match any data set from the first group of data. In this way, address data
which is stored in a postie walk book 50 but is not stored in the national geographic
database 40 can be used to update the national geographic database 40.

In each case, the list could be compiled by identifying the geographic data from the
geographic database which does not appear in the candidate match list calculated at
510 or by identifying the geographic data from the postie walk book 50 which does
not appear in the candidate match list calculated at 510. The lists compiled at 518
and 520 are preferably stored in memory 80.
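
Compiling the two lists from the candidate match list amounts to two set differences. A minimal Python sketch, assuming the candidate list holds pairs of record identifiers and using the hypothetical function name unmatched_lists, is:

```python
from typing import Hashable, Iterable, List, Tuple


def unmatched_lists(geo_ids: Iterable[Hashable],
                    other_ids: Iterable[Hashable],
                    candidates: Iterable[Tuple[Hashable, Hashable]]
                    ) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (records only in the first group, records only in the second group)."""
    candidates = list(candidates)
    matched_geo = {geo_id for geo_id, _ in candidates}
    matched_other = {other_id for _, other_id in candidates}
    only_in_geo = [g for g in geo_ids if g not in matched_geo]          # list at 518
    only_in_other = [o for o in other_ids if o not in matched_other]    # list at 520
    return only_in_geo, only_in_other
```

The first list would go to a postal worker for verification and the second would be used to update the database, as described above.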

The invention provides a method and system of maintaining a definitive address
database using data obtained from a number of sources. The invention may be used
to define and manage postal delivery areas and to test out new scenarios for the
arrangement of the delivery areas to maximise efficiency.

The data in the geographic database can be used as a dynamic information source
for day-to-day use by operational and managerial personnel. It can be kept current
on an on-going basis by ensuring that postal workers carry a current map with them
and make notes of any address changes so that these changes can be entered into
the geographic database.

The geographic database could be made available to various emergency services, for
example, the fire service, ambulance service, and police. The data could also be
made available to third parties for the purposes of direct marketing opportunities.
These third parties could include retailers, banking and financial institutions, mail
order firms, communication companies, councils and credit providers.

The foregoing describes the invention including preferred forms thereof. Alterations
and modifications as will be obvious to those skilled in the art are intended to be
incorporated within the scope hereof.
CLAIMS:

1. A method of maintaining geographic data comprising the steps of:

retrieving from a data memory a first group of data comprising one or more
data sets representing geographic data, each data set comprising one or more data
items;

compiling a second group of data comprising one or more data sets
representing geographic data, each data set comprising one or more data items; and

generating one or more spatial representations based on the first and/or
second groups of data.

2. A method of maintaining geographic data as claimed in claim 1 further
comprising the step of compiling a list of data sets from the first group of data which
do not match any data set from the second group of data.

3. A method of maintaining geographic data as claimed in claim 1 or claim 2
further comprising the step of compiling a list of data sets from the second group of
data which do not match any data set from the first group of data.

4. A method of maintaining geographic data as claimed in claim 2 or claim 3
wherein a first data set matches a second data set if all data items of the first data
set are members of the second data set.

5. A method of maintaining geographic data as claimed in any one of the
preceding claims further comprising the step of retrieving the second group of data
from a manual source.

6. A method of maintaining geographic data as claimed in any one of claims 1
to 4 comprising the step of retrieving the second group of data from a data memory.

7. A method of maintaining geographic data as claimed in claim 6 wherein the
data items of one or more data sets comprise character strings.

8. A method of maintaining geographic data as claimed in claim 7 further
comprising the steps of concatenating the user data items into a single string and
performing match comparisons based on string comparisons.

9. A method of maintaining geographic data as claimed in any one of the
preceding claims wherein one or more of the data sets represent street addresses.

10. A method of maintaining geographic data as claimed in any one of the
preceding claims wherein one or more of the data sets represent postal addresses.



11. A geographic data maintaining system comprising: 

a retrieval device arranged to retrieve from a data memory a first group of
data comprising one or more data sets representing geographic data, each data set
comprising one or more data items;

a second group of data comprising one or more data sets representing
geographic data, each data set comprising one or more data items; and

a representation generator arranged to generate one or more spatial
representations based on the first and/or second groups of data.

12. A system as claimed in claim 11 further comprising a list of data sets from
the first group of data which do not match any data sets from the second group of
data.

13. A system as claimed in claim 11 further comprising a list of data sets from
the second group of data which do not match any data set from the first group of
data.

14. A system as claimed in claim 12 or claim 13 wherein a first data set
matches a second data set if all data items of the first data set are members of the
second data set.



15. A system as claimed in any one of claims 11 to 14 wherein the second group
of data is compiled from a manual source.

16. A system as claimed in any one of claims 11 to 14 wherein the second group
of data is retrieved from a data memory.

17. A system as claimed in claim 16 wherein the data items of one or more data
sets comprise character strings.

18. A system as claimed in claim 17 further comprising a data matcher arranged
to concatenate data items into a single string and to compare data items based on
string comparisons.

19. A system as claimed in any one of claims 11 to 18 wherein one or more of
the data sets represent street addresses.

20. A system as claimed in any one of claims 11 to 19 wherein one or more of
the data sets represent postal addresses.



21. A geographic data maintaining computer program comprising: 

a retrieval device arranged to retrieve from a data memory a first group of
data comprising one or more data sets representing geographic data, each data set
comprising one or more data items;

a second group of data comprising one or more data sets representing
geographic data, each data set comprising one or more data items; and

a representation generator arranged to generate one or more spatial
representations based on the first and second groups of data.

22. A computer program as claimed in claim 21 further comprising a list of data
sets from the first group of data which do not match any data set from the second
group of data.

23. A computer program as claimed in claim 21 or claim 22 further comprising a
list of data sets from the second group of data which do not match any data set from
the first group of data.



24. A computer program as claimed in claim 22 or claim 23 wherein a first data 
set matches a second data set if all data items of the first data set are members of 
the second data set. 

25. A computer program as claimed in any one of claims 21 to 24 wherein the
second group of data is compiled from a manual source.

26. A computer program as claimed in any one of claims 21 to 24 wherein the
second group of data is retrieved from a data memory.

27. A computer program as claimed in claim 26 wherein the data items of one or
more data sets comprise character strings.

28. A computer program as claimed in claim 27 further comprising a string
comparator arranged to concatenate the data items into a single string and match
each data item to other data items based on string comparisons.

29. A computer program as claimed in any one of claims 21 to 28 wherein one or
more of the data sets represent street addresses.

30. A computer program as claimed in any one of claims 21 to 29 wherein one or
more of the data sets represent postal addresses.

31. A computer program as claimed in any one of claims 21 to 30 embodied on a
computer-readable medium.